Abstract

In this note we study the existence of a solution to the survey-propagation equations for
the random K-satisfiability problem for a given instance. We conjecture that when the
number of variables goes to infinity, the solution of these equations for a given instance
can be approximated by the solution of the corresponding equations on an infinite tree.
We conjecture (and present numerical evidence) that the survey-propagation equations
on the infinite tree have a unique solution in a suitable range of parameters.

1 Introduction 

Recently much progress has been made in the analytic and numerical study of the
random K-satisfiability problem, using the approach of survey propagation, which
generalizes the older approach based on the "Min-Sum" algorithm.^1



The aim of this note is to sketch a possible path to a proof of the existence and uniqueness of
a solution to the survey-propagation equations for a given instance of the problem. Before presenting
the main arguments, for the reader's convenience I will present a heuristic derivation of the
survey-propagation equations in section 2; the full derivation can be found in the original papers [11, 12].
In section 3, I will present the main sequence of conjectures that may lead to a proof of
the existence and uniqueness of a solution to the survey-propagation equations in the appropriate
range of parameters. Finally I will present some conclusions.

^1 The "Min-Sum" algorithm is the zero-temperature limit of the "Sum-Product" algorithm, which is
sometimes also called belief propagation. In the language of statistical mechanics the belief propagation
equations are the extension of the TAP equations for spin glasses, and the survey-propagation equations
are the TAP equations generalized to the broken replica case.






2 A fast heuristic derivation of the survey equations 



2.1 The random K-sat problem 

In the random K-sat problem there are N variables $\sigma(i)$ that may be true or false (the index i
will sometimes be called a node). An instance of the problem is given by a set of $M = \alpha N$ clauses.
Each clause c is characterized by a set of three nodes $(i_1^c, i_2^c, i_3^c)$, which belong to the interval
from 1 to N, and by three Boolean variables $(b_1^c, b_2^c, b_3^c)$. In the random case the i and b variables
are random with flat probability distribution. Each clause c is true if the expression

$$ (\sigma(i_1^c) \ \mathrm{XOR}\ b_1^c) \ \mathrm{OR}\ (\sigma(i_2^c) \ \mathrm{XOR}\ b_2^c) \ \mathrm{OR}\ (\sigma(i_3^c) \ \mathrm{XOR}\ b_3^c) \tag{1} $$

is true. The problem is satisfiable iff we can find a set of values of the variables $\sigma$ such that all
the clauses are true. The entropy of a satisfiable problem is the logarithm of the number of different
sets of values of the $\sigma$ variables that make all the clauses true.
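
As an illustration of these definitions, here is a minimal Python sketch that draws a random instance, evaluates expression (1), and counts satisfying assignments by brute force. The function names are hypothetical, and drawing three distinct nodes per clause is an assumption of the sketch; the brute-force count is only feasible for very small N.

```python
import itertools
import math
import random

def random_instance(n_vars, alpha, rng):
    """Draw M = round(alpha * N) random clauses as (nodes, b flags)."""
    n_clauses = round(alpha * n_vars)
    clauses = []
    for _ in range(n_clauses):
        nodes = tuple(rng.sample(range(n_vars), 3))   # assumption: distinct nodes
        flags = tuple(rng.random() < 0.5 for _ in range(3))
        clauses.append((nodes, flags))
    return clauses

def clause_is_true(clause, sigma):
    """Expression (1): OR over the three literals sigma(i_k) XOR b_k."""
    nodes, flags = clause
    return any(sigma[i] != b for i, b in zip(nodes, flags))

def entropy(n_vars, clauses):
    """Log of the number of satisfying assignments (brute force, small N only)."""
    count = sum(
        all(clause_is_true(c, sigma) for c in clauses)
        for sigma in itertools.product((False, True), repeat=n_vars)
    )
    return math.log(count) if count > 0 else None
```

For a single clause on three variables with all b false, seven of the eight assignments satisfy the clause, so the entropy is log 7.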

To a given problem we can associate a graph (the factor graph) where the nodes are
connected to the clauses ($3\alpha$ on average) and each clause is connected to three nodes. The
properties of this graph play a very important role.

The goal of the analytic approach is to find, for given $\alpha$ and for large values of N, the
probability that a random problem (i.e. a problem with randomly chosen clauses) is satisfiable.
The 0-1 law is supposed to be valid: for $\alpha < \alpha_c$ all systems (with probability one
when N goes to infinity) are satisfiable and their entropy is proportional to N with a constant of
proportionality that does not depend on the problem. On the other hand, for $\alpha > \alpha_c$ no random
system (with probability one) is satisfiable. A heuristic argument has been given that
suggests that $\alpha_c = \alpha^* \approx 4.27$, where $\alpha^*$ can be computed using the survey-propagation
equations defined later. There is already an incomplete proof [10] that $\alpha^*$ is a rigorous upper bound to $\alpha_c$.



2.2 The belief propagation equations

In the following analysis it will be crucial that in the limit $N \to \infty$ the problem becomes locally
trivial: this makes possible the computation of $\alpha_c$ and of the other properties of the system.
Let us be more precise. We can define a distance among the N nodes in the following way:

• Two nodes are at distance 1 if there is a clause that contains both of them. At large N the
number of nodes at distance 1 from a given node has a Poisson distribution with average
$6\alpha$.

• Two nodes are at distance k if they are not at distance k − 1 or less and there is a chain of k
overlapping clauses that touches both of them (or equivalently the second node is at distance
1 from a node at distance k − 1 from the first node).

In the limit $N \to \infty$ the set of nodes within distance k from a given node forms a tree, without
closed loops: locally the system looks like a tree. The K-sat problem on a tree (with
given boundary conditions) can be solved trivially: the belief propagation algorithm (defined
later) is exact. When N goes to infinity the system as a whole is not a tree (with probability 1) and this
fact makes it hard to find an actual solution of the problem for a given instance; it may even
destroy the global existence of a solution to the belief propagation equations. However the local
treeness of the problem is enough to allow an analytic treatment.
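
The distance defined above can be computed by a breadth-first search. The following Python sketch (hypothetical helper names, clauses stored as plain tuples of node indices, b variables ignored) builds the distance-1 neighbourhoods of a random instance and checks that the mean number of nodes at distance 1 is close to $6\alpha$; the sizes chosen are arbitrary.

```python
import random
from collections import deque

def neighbours(n_vars, clauses):
    """Two nodes are at distance 1 if some clause contains both of them."""
    adj = [set() for _ in range(n_vars)]
    for nodes in clauses:
        for i in nodes:
            adj[i].update(j for j in nodes if j != i)
    return adj

def distances_from(root, adj):
    """Breadth-first search: dist[j] is the distance of node j from root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist

rng = random.Random(1)
n_vars, alpha = 5000, 4.0
clauses = [tuple(rng.sample(range(n_vars), 3)) for _ in range(int(alpha * n_vars))]
adj = neighbours(n_vars, clauses)
mean_degree = sum(len(a) for a in adj) / n_vars   # expected to be close to 6 * alpha
```

Counting how many nodes appear in each shell of `distances_from(root, adj)` gives a direct way to see how far the neighbourhood of a node remains tree-like.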

Let us take a large system for $\alpha < \alpha_c$ and let us consider the set C of all configurations
that satisfy all the clauses. Our first task is to compute the probabilities $p_T(i)$ and $p_F(i)$
that the variable $\sigma(i)$ is true or false (obviously $p_T(i) + p_F(i) = 1$) if $\sigma$ belongs to a
random configuration in C.

The presence of a simple local structure allows us to write down simple local equations.
In the case of the belief propagation equations one proceeds as follows. For each
clause c that contains the node i (we will use the notation $c \in i$ although it may not be the most
appropriate), $p_T(i, c)$ is the probability that the variable $\sigma(i)$ would be true in the absence of the
clause c. If the node $i_1^c$ were contained in only one clause, we would have that

$$ p_T(i_1^c) = u_T(i,c) , \qquad p_F(i_1^c) = 1 - u_T(i,c) \equiv u_F(i,c) , \tag{2} $$

where $u_T$ is an appropriate function that is defined by the previous relation. An easy computation
shows that when all the b are false, the variable $\sigma(i_1^c)$ must be true if both variables $\sigma(i_2^c)$ and
$\sigma(i_3^c)$ are false; otherwise it can have any value. Therefore we have in this case that

uA^, c) = PFirl. c)pF{rl c) + ^ " pH^-.^cK(^^, c) 

In a similar way, if some of the b variables are true, we should exchange $p_F$ with $p_T$ for
the corresponding variables. Finally we have that



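$$ p_T(i,c) = \frac{\prod_{d \in i, d \neq c} u_T(i,d)}{Z(i,c)} , \qquad p_F(i,c) = \frac{\prod_{d \in i, d \neq c} u_F(i,d)}{Z(i,c)} , $$

$$ Z(i,c) = \prod_{d \in i, d \neq c} u_T(i,d) + \prod_{d \in i, d \neq c} u_F(i,d) . \tag{4} $$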

In total there are 3M variables $p_T(i,c)$ and 3M equations. These equations go under different
names in the literature (e.g. belief propagation, TAP equations) and we naively
expect that these equations (belief propagation) are satisfied (or quasi-satisfied, i.e. small
corrections can be present). For the time being let us suppose that such a solution exists
and is unique. In such a case we expect that

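$$ p_T(i) = \frac{\prod_{d \in i} u_T(i,d)}{Z(i)} , \qquad p_F(i) = \frac{\prod_{d \in i} u_F(i,d)}{Z(i)} , $$

$$ Z(i) = \prod_{d \in i} u_T(i,d) + \prod_{d \in i} u_F(i,d) . \tag{5} $$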
We note that the previous formulae can be written in a more compact way if we introduce a
two-dimensional vector $\vec p$, with components $p_T$ and $p_F$. We define the product of these vectors as

$$ c_T = a_T b_T , \qquad c_F = a_F b_F , \tag{6} $$

if $\vec c = \vec a \cdot \vec b$.

If the norm of a vector is defined by

$$ |\vec a| = a_T + a_F , \tag{7} $$

we finally find that

$$ \vec p(i,c) = \frac{\prod_{d \in i, d \neq c} \vec u(i,d)}{\left| \prod_{d \in i, d \neq c} \vec u(i,d) \right|} . \tag{8} $$
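
The compact vector form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vectors are stored as (true, false) pairs, the messages u(i, d) are assumed to be already known and are passed in a dictionary keyed by clause label, and all names are hypothetical.

```python
from functools import reduce

def product(a, b):
    """Componentwise product of two (true, false) vectors, as in equation (6)."""
    return (a[0] * b[0], a[1] * b[1])

def norm(a):
    """Norm of equation (7): the sum of the two components."""
    return a[0] + a[1]

def cavity_belief(u_by_clause, excluded):
    """Normalised product of the u(i, d) over all clauses d different from c."""
    factors = [u for d, u in u_by_clause.items() if d != excluded]
    total = reduce(product, factors, (1.0, 1.0))
    z = norm(total)            # plays the role of Z(i, c) in equation (4)
    return (total[0] / z, total[1] / z)
```

For example, with messages (0.6, 0.4) and (0.5, 0.5) from the two clauses other than c, the product is (0.3, 0.2), its norm is 0.5, and the resulting belief is (0.6, 0.4).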



For each clause c we can define the probability that the clause would be satisfied in a system
where the clause is not present. We will denote this probability by Z(c). One finds that in the
case where all the b variables are false

$$ Z(c) = 1 - p_F(i_1^c, c)\, p_F(i_2^c, c)\, p_F(i_3^c, c) . \tag{9} $$

Finally the total entropy (apart from corrections that are subleading when N goes to infinity) is given
by

$$ S = - \sum_{i=1}^{N} \log(Z(i)) + 2 \sum_{c=1}^{M} \log(Z(c)) . \tag{10} $$



2.3 The survey propagation equations 



One can argue that the situation is more complex. The belief propagation algorithm
works at low values of $\alpha$ but it fails when $\alpha$ becomes too large. The previous equations may have
multiple solutions or quasi-solutions that are very different from one another.



This fact has been interpreted [1, 13] in the following way: the set C of all configurations
that satisfy all the clauses can be divided into many sets that are well separated from one
another (these sets are sometimes called states in statistical mechanics, or lumps). The
previous belief equations correspond to the probability restricted to one given set.

The new picture is the following: we have an exponentially large number of solutions (or
quasi-solutions) of the belief equations (we call this set B) and we would like to know this number
(i.e. the exponential of the complexity $\Sigma(\alpha)$). The total number of configurations that satisfy
all the clauses is given by

$$ \exp(S(\alpha)) = \exp(\Sigma(\alpha) + S_s(\alpha)) \tag{11} $$

where $\exp(S_s(\alpha))$ is the total number of configurations that satisfy all the clauses in a generic
state.

One expects that $\Sigma(\alpha)$ vanishes at $\alpha_c$, so that its computation is extremely important. In
order to reach this goal we can mimic the steps that we took in going from the variables
$\sigma$ to the probabilities $p_T(i)$; however in this case we are going to play the game at a higher level
of abstraction, where states (or quasi-solutions of the belief equations) play the same role as a
single configuration of the $\sigma(i)$ in the previous approach.

The new quantity is the full survey probability $s(i|p)$, i.e. a function of p that is defined at
each node and is equal to the probability of finding a solution (or a quasi-solution) of the belief
equations with $p(i) = p$. In other words $s(i|p)$ is an indirect probability, i.e. a probability of a
probability.

We introduce the quantity $s(i, c|p)$ that is the probability distribution of the probability (i.e.
$s(i|p)$) when the clause c is removed. The equations for the full survey probability $s(i, c|p)$ can
be obtained using the techniques of [12]; however we will not consider them here. Indeed we
need them only if our aim is to compute the total entropy (or $S_s(\alpha)$). If our aim is more modest and
we want to compute only $\Sigma(\alpha)$ and consequently $\alpha_c$, a simpler approach is possible.

The crucial observation is that a given solution of the belief equations may have $p_T(i,c) = 1$
or $p_T(i,c) = 0$ or $0 < p_T(i,c) < 1$.

We can coarse grain the probability distribution of the beliefs by introducing the quantity
$s_T(i,c)$, defined as the probability of finding $p_T(i,c) = 1$; in the same way $s_F(i,c)$ is the
probability of finding $p_T(i,c) = 0$ and $s_I(i,c)$ is the probability of finding $0 < p_T(i,c) < 1$. As
discussed in [1], by considering the equations for only these coarse grained surveys, it is
possible to compute the complexity $\Sigma$ and the value of $\alpha_c$.

We can use a more compact notation by introducing a three-dimensional vector $\vec s$ given by

$$ \vec s = \{ s_T, s_I, s_F \} . \tag{12} $$

Everything works as before with the only difference that we have a three-component vector
instead of a two-component vector. Generalizing the previous arguments one can introduce the
quantity $\vec u(i,c)$, that is the value that the survey at i would take if only the clause c were
present at i. In the case where all the b are false, a simple computation gives

$$ \vec u(i,c) = \{ s_F(i_2^c, c)\, s_F(i_3^c, c) ,\; 1 - s_F(i_2^c, c)\, s_F(i_3^c, c) ,\; 0 \} . \tag{13} $$

The formula can be generalized as before to the case of different values of b. One finally finds

$$ \vec s(i,c) = \frac{\prod_{d \in i, d \neq c} \vec u(i,d)}{\left| \prod_{d \in i, d \neq c} \vec u(i,d) \right|} , \tag{14} $$

where we have defined the product in such a way that

$$ \vec a\, \vec b = \{ a_T b_T + a_I b_T + a_T b_I ,\; a_I b_I ,\; a_F b_F + a_I b_F + a_F b_I \} . \tag{15} $$

The previous equations are the survey propagation equations of the original papers, where the
reader can find the details of the derivation.
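
The three-component product and the update it enters can be sketched as follows. This is a minimal illustration, not the original implementation: vectors are ordered as (true, indifferent, false) as in equation (12), the norm is assumed to be the sum of the three components (by analogy with the two-component case), only the case with all b false is covered, and the function names are hypothetical.

```python
from functools import reduce

def survey_product(a, b):
    """Product of equation (15); terms mixing 'true' and 'false' are dropped."""
    a_t, a_i, a_f = a
    b_t, b_i, b_f = b
    return (a_t * b_t + a_i * b_t + a_t * b_i,
            a_i * b_i,
            a_f * b_f + a_i * b_f + a_f * b_i)

def clause_message(s2, s3):
    """Equation (13), all b false: the clause forces 'true' only when both
    other variables are frozen to false."""
    q = s2[2] * s3[2]
    return (q, 1.0 - q, 0.0)

def cavity_survey(messages):
    """Equation (14): normalised product of the given u(i, d).
    Assumption: the norm is the sum of the three components."""
    total = reduce(survey_product, messages, (0.0, 1.0, 0.0))
    z = sum(total)
    if z == 0.0:
        raise ValueError("every term is contradictory; the survey is undefined")
    return tuple(x / z for x in total)
```

The vector (0, 1, 0) is the identity for the product (15), which is why it is used as the starting value. When two messages pull in opposite directions, the dropped contradictory terms make the norm smaller than one before normalisation.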

If one finds a solution to the survey equations one can compute the survey probability as

$$ \vec s(i) = \frac{\prod_{d \in i} \vec u(i,d)}{\left| \prod_{d \in i} \vec u(i,d) \right|} \tag{16} $$



^2 It always happens that the vector $\vec u$ has one zero component ($u_T u_F = 0$). This fact may be used to
further simplify the analysis.

Figure 1: An example of a graph with loops.



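and the total complexity is given by

$$ \Sigma = - \sum_{i=1}^{N} \log(Z(i)) + 2 \sum_{c=1}^{M} \log(Z(c)) , \tag{17} $$

where now

$$ Z(i) = \left| \prod_{d \in i} \vec u(i,d) \right| , \qquad Z(c) = | \vec s(i,c)\, \vec u(i,c) | . \tag{18} $$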



3 The infinite rooted tree and some conjectures 

In the previous section we have sketched a heuristic derivation of the survey equations: it
should be clear that these equations should be taken as they are and there is no warranty of any
kind, either expressed or implied, that the belief equations (that have been used heuristically to
construct the survey equations) do have a global solution.

The aim of this note is not to present a more precise derivation of the survey equations, but to
discuss the problem of finding solutions to the survey propagation equations: this is a well posed
mathematical problem independently of the origin of these equations. If the survey equations
had no solutions, the previous arguments would be empty; if the survey equations had an
exponential number of solutions, we should go up to a higher level of abstraction.

Numerical experiments on systems of size up to N = 10^ show that in the interval
$\alpha_L < \alpha < \alpha_U$ ($\alpha_L$ and $\alpha_U$ asymptotically do not depend on the system; they are near to
3.9 and 4.36 for large N), the survey equations do have one solution that is obtained by iteration.
For $\alpha < \alpha_L$ the survey equations converge to the trivial solution $s_I(i,c) = 1$. On the other hand,
for $\alpha > \alpha_U$ the iterative equations do not converge. Moreover the difference between a solution of
the survey propagation equations and a solution of perturbed survey propagation equations (e.g.
obtained by adding or removing a clause, or by fixing a survey to an arbitrary value) diverges when
we approach $\alpha_U$ from below. The complexity changes sign at a value $\alpha^*$ such that $\alpha_U > \alpha^* > \alpha_L$.
Figure 2: The central part of the infinite tree associated to the example shown in fig. 1. The
numbers near the nodes are the values of the function M evaluated at that node.

3.1 The infinite tree 

These results call for an analytic proof. The aim of this note is to suggest a possible approach. 

We propose to generalize the construction that Aldous has recently used in the study of the
random matching problem [18]. To any node i of a given problem for finite N we associate an
infinite tree rooted in i that is constructed in the following way.^3 Let us denote by $\gamma$ a node of
the infinite tree. We assume that there is a function $M(\gamma)$ that maps the nodes of the infinite
tree onto the nodes of the original problem in such a way that if $\gamma_1$, $\gamma_2$ and $\gamma_3$ belong to the
same clause, the three nodes $i_k = M(\gamma_k)$ also belong to a clause (the variables $b_k$ of the two
clauses have the same values). We can further impose that the number of nodes at distance one
from $\gamma$ is equal to the number of nodes at distance one from $M(\gamma)$. The construction is simpler
than it may look. In the case of a problem with four variables and clauses that involve only
two sites (i.e. 2-sat rather than 3-sat, for graphical convenience) the original graph is shown in figure
1, while the center of the infinite graph, rooted in 0, is shown in fig. 2.
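
One possible way to realise this construction on a computer is to unroll the graph from the root, never walking back through the clause just used. The Python sketch below builds the first shells of such a tree; it is an illustration under assumptions, not the construction of the figures: clauses are plain tuples of node indices, the b variables are ignored, and the names `rooted_tree` and `shell` are hypothetical. The first entry of each tree node plays the role of $M(\gamma)$.

```python
def rooted_tree(root, clauses, depth):
    """First `depth` shells of the tree rooted at `root`.

    Each tree node is a tuple (original_node, parent_clause, children).
    Children are reached through every clause containing the node,
    except the clause we arrived from.
    """
    clauses_of = {}
    for c, nodes in enumerate(clauses):
        for i in nodes:
            clauses_of.setdefault(i, []).append(c)

    def grow(node, parent_clause, remaining):
        children = []
        if remaining > 0:
            for c in clauses_of.get(node, []):
                if c == parent_clause:
                    continue
                for j in clauses[c]:
                    if j != node:
                        children.append(grow(j, c, remaining - 1))
        return (node, parent_clause, children)

    return grow(root, None, depth)

def shell(tree, k):
    """Labels M(gamma) of the tree nodes at distance k from the root."""
    if k == 0:
        return [tree[0]]
    return [label for child in tree[2] for label in shell(child, k - 1)]
```

For a 2-sat triangle with clauses (0, 1), (1, 2), (2, 0), the shells around node 0 are [1, 2], then [2, 1], then [0, 0]: the tree never closes, and the nodes of the original graph reappear at larger distances.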

The construction of the infinite rooted tree is the simplest way to project a finite graph onto a
tree while preserving as much as possible of the structure of the original graph. The fact that for large
N the original problem does not contain small loops implies that the original problem is locally
very similar to a subset of the infinite tree. This fact suggests that the solution of the K-sat



^3 Sometimes the tree is not infinite, e.g. if the site i has no neighbour; however for not too small $\alpha$ the tree is
infinite in most cases. If no loops were present in the original graph (a rather unlikely possibility for
large N), the infinite rooted tree would be identical to the original graph.
problem on the infinite tree may give us information on the solution of the K-sat problem for
the original problem.

Another object that we can construct is a random infinite tree. It is a random tree where
the number of nearest nodes has a Poisson distribution. It is evident that for large N the infinite
tree associated to a given random problem becomes locally very near to a random infinite tree:
the first contains correlations that vanish when N goes to infinity. Moreover the properties of the
infinite random tree can be studied analytically. One would like to compare the properties of a
given problem with the properties of the associated infinite tree; the hope is that the infinite tree
associated to a random problem with N variables should become similar to the random infinite
tree when N goes to infinity.

We will say that a problem on the infinite tree has a unique solution iff, when we impose
generic boundary conditions on the (k + 1)-th shell, the behaviour at the center of the tree does not
depend on the boundary conditions with probability one when k goes to infinity.

The problem of computing a solution that satisfies all the clauses cannot have a unique
solution in the previous sense: the entropy has a term proportional to N and there is an exponentially
large number of different solutions.

The belief propagation equations have a unique solution for a given belief on the boundary
(quasi-solutions fade away if closed loops are absent). An explicit computation shows that when
$\alpha$ is greater than a critical value (around 3.9) the solution of the belief equations does depend on the
boundary. One could also argue that if the solution of the belief equations were independent
of the boundary conditions when k goes to infinity, there should be an essentially unique solution
of the belief equations for a given problem, and for sufficiently large $\alpha$ this does not happen
(i.e. Aldous' essential uniqueness property fails for the belief equations).

3.2 Three conjectures 

We have now to exclude the possibility that the survey propagation equations do not have a stable
solution on the rooted tree near $\alpha_c$. This is a highly non-trivial requirement that fails in other
models or for other forms of survey equations in the K-sat problem.

If the survey-propagation approach is correct, the following conjectures should be true:

1. For $\alpha < \alpha_U$ and large N the infinite tree associated to a random problem has one stable
solution with probability one. If this happens, the corresponding survey probabilities will
be denoted by $s_\infty(i, c)$.

2. For large N the survey probabilities $s_\infty(i, c)$ should be an approximate solution of the
survey probability equations of the original problem. In other words, for a given sample
and $\alpha < \alpha_U$ there is a solution of the survey probability equations near to $s_\infty(i, c)$.

3. The value of $\alpha_U$ is determined by the following condition: for $\alpha < \alpha_U$ the infinite random
tree has only one stable solution, while this does not happen for $\alpha > \alpha_U$.

A direct numerical test of these conjectures is not easy, especially in the interesting region
where $\alpha$ is near to 4. In principle it is possible to construct in an explicit way the first k shells
of the infinite tree and to study what happens for large k. Unfortunately the number of first
neighbour nodes is $6\alpha$, so that for $\alpha = 4$ the k-th shell contains of the order of $24^k$ nodes, which is
a very large number for numerical analysis already for k = 5. If N is not much greater than
$24^5$, there will be many repetitions of the same nodes in the first five shells of the tree, and on
such a scale the original problem does not look very tree-like. I have studied numerically
problems up to N = 10^ and k = 4, and the data (e.g. at $\alpha = 4$) are consistent with the first
two conjectures, although it is difficult to arrive at convincing evidence.

If we accept also the last conjecture, we can compute the value of $\alpha_U$ on the infinite random
lattice and we can compare it with the numerical estimates for a given problem, i.e. $\alpha_U \approx 4.36$.
To this end we must study the survey equations on the infinite random tree. In this case it is
natural to suppose a translational invariance property.

Let us call $\mathcal{P}_k(s)$ the probability distribution of the survey at a generic node at distance k
from the origin. Translational invariance implies that $\mathcal{P}_k(s)$ does not depend on k: it will be
denoted by $\mathcal{P}(s)$. This quantity plays a crucial role in the approach.^5

3.3 A consistency check 

In principle it is possible that this construction fails: the equations for the survey may have a
solution that depends on the boundary condition. In such a case a more complex construction
should be done, and the aim of this note is to exclude that this happens.

The properties of $\mathcal{P}(s)$ can be well studied numerically and hopefully analytically. Indeed
the surveys of the nodes on a shell at distance k from the origin can be expressed in terms of
the surveys at the nodes at distance k + 1; using the supposed translational invariance of the
probability distribution we get a consistency equation.

The procedure is the following. We consider a node with z clauses, where z has a Poisson
distribution with average $3\alpha$; we assume that the nearest nodes have survey probabilities
distributed according to $\mathcal{P}(s)$ and we compute the survey probabilities for the new node. If we
average over z and over the random clauses we get a new survey probability $\Pi[\mathcal{P}](s)$, that obviously
depends on $\mathcal{P}(s)$. The equation for $\mathcal{P}(s)$ is simply

$$ \Pi[\mathcal{P}](s) = \mathcal{P}(s) . \tag{19} $$



Using the techniques of [1, 13] one can also construct a functional $\Phi[\mathcal{P}]$ such that the
actual solution of equation (19) can be found by minimizing this functional (however we will
not discuss this point).

^4 For each node i, we have to consider all the different surveys with one of the clauses removed; for simplicity
we will not indicate the c dependence and in the following s(i) will be shorthand for all the s(i, c).

^5 It is convenient to recall the heuristic definition of $\mathcal{P}(s)$. We decompose into states the set of the configurations
that satisfy all the conditions. For each state we compute the belief probability that a given variable is true. The
survey probability characterizes the probability distribution of the belief at a given site: the survey is a probability
of a probability (an indirect probability). Finally $\mathcal{P}(s)$ is the probability of finding a site with that particular
survey probability. In other words $\mathcal{P}(s)$ is a probability of a probability of a probability; in the simplified case
we are studying here it is a function (we only care if a belief is equal to ±1 or not), while in the more general case
it would be a functional.
It is a standard conjecture [12] that equations of the form (19) can be solved by iteration
(and this likely follows from the convexity properties of $\Phi[\mathcal{P}]$). If this is the case, we can use
the method of population dynamics to find the solution of equation (19). The method is very
simple and can be trivially implemented on a computer. It consists in describing a probability
$\mathcal{P}$ by an ensemble of L elements distributed according to this probability; the method becomes
exact when L goes to infinity.

We consider a set of L surveys. Starting from this set we generate a new ensemble of L
surveys by using the standard procedure. The construction of an
element of the new ensemble is done as follows. We extract a Poisson distributed integer z with
average $3\alpha$. We take 2z surveys extracted randomly from the L surveys (two for each clause) and we
extract randomly the b of the corresponding z clauses. Using equations (13)-(15) we compute one of the
surveys that belong to the new ensemble. Finally, by repeating this operation L times we obtain the new
ensemble.
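
The step just described can be sketched in Python as follows. This is a rough illustration under stated assumptions rather than the code used for the results of this note: surveys are (true, indifferent, false) triples, a true b is handled by exchanging the true and false components of the corresponding survey, a candidate whose product has zero norm is simply discarded and redrawn, and all function names are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Poisson sample by Knuth's multiplication method (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def survey_product(a, b):
    """Product of equation (15) for (true, indifferent, false) vectors."""
    return (a[0] * b[0] + a[1] * b[0] + a[0] * b[1],
            a[1] * b[1],
            a[2] * b[2] + a[1] * b[2] + a[2] * b[1])

def new_survey(population, alpha, rng):
    """One element of the new ensemble, or None if the norm vanishes."""
    z = poisson(3.0 * alpha, rng)
    total = (0.0, 1.0, 0.0)                      # identity of the product
    for _ in range(z):
        q = 1.0
        for _ in range(2):                       # two other nodes per clause
            s = rng.choice(population)
            # random b: a true b exchanges the roles of 'true' and 'false'
            q *= s[0] if rng.random() < 0.5 else s[2]
        # random b of the new node: the clause pushes it to true or to false
        u = (q, 1.0 - q, 0.0) if rng.random() < 0.5 else (0.0, 1.0 - q, q)
        total = survey_product(total, u)
    norm = sum(total)
    if norm == 0.0:
        return None
    return tuple(x / norm for x in total)

def iterate(population, alpha, rng):
    """Build a new ensemble of the same size L from the current one."""
    new = []
    while len(new) < len(population):
        s = new_survey(population, alpha, rng)
        if s is not None:
            new.append(s)
    return new
```

Repeated calls to `iterate` with the same generator `rng` give the sequence of ensembles; passing identically seeded generators to two runs is one way to drive them with the same random numbers.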

By iterating this procedure l times we find for large l a probability distribution on the surveys
that is l dependent. This procedure can be carried out also for large values of L (e.g. L = 10^) and
the convergence is rather fast (the corrections seem to be proportional to 1/L for generic $\alpha$). If
the limits $L \to \infty$ and $l \to \infty$ are smooth (the limit $L \to \infty$ should be taken first), the resulting
probability distribution is a solution of equation (19).

In the same approach we can ask what happens in the population dynamics if we start from
two different sets of surveys at the initial step. Let us indicate the i-th survey at iteration t
with $s(i,t)$. Let us assume that we run the population algorithm twice (with the same random number
generator) but taking two different sets as starting points: $s_1(i,0)$ and $s_2(i,0)$. We expect that
for large t the probability distribution of the surveys should be the same; however it is not
evident whether, for a given i,

$$ s_1(i,t) - s_2(i,t) \to_{t \to \infty} 0 . \tag{20} $$

It is natural to conjecture that if this happens, the survey equations have a unique solution
on a random infinite tree. Indeed the computation of the surveys on the shell M − l as a function
of the surveys on the shell M − l + 1 can be done exactly using the same algorithm we use in
the population dynamics, with the only difference that the total number of surveys is constant
in the population dynamics (i.e. it is equal to L) while it decreases with l on a tree (the number
is proportional to $(6\alpha)^{M-l}$). In the limit of L and M going to infinity this difference should not
be relevant.

In order to verify whether equation (20) is true it is convenient to define a distance D(t) as

$$ D(t) = \frac{\sum_{i=1}^{L} | s_1(i,t) - s_2(i,t) |^2}{L} . \tag{21} $$

I have numerically studied the properties of D(t) for large L (up to L = 10^), finding little L
dependence (as expected). I find that for $\alpha < \alpha_U \approx 4.36$, for large t:

$$ D(t) \propto \exp(-\lambda(\alpha) t) , \tag{22} $$
Figure 3: The numerical results for the exponent $\lambda(\alpha)$, defined in equation (22), as a function of
$\alpha$.

where $\lambda(\alpha)$ is positive for $\alpha < \alpha_U$ and vanishes at $\alpha = \alpha_U$. For $\alpha > \alpha_U$, but not too large, both
$\vec s_1$ and $\vec s_2$ go to the same probability distribution, but D(t) does not go to zero. In other
words $-\lambda(\alpha)$ is the maximum Lyapunov exponent: when it is negative the iteration converges to a
fixed point, while when it is positive chaos is present.
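
The measurement of the distance and of the decay exponent can be sketched as follows. The Python fragment assumes that the two populations are lists of survey triples paired element by element, and that they were produced by two runs driven by the same random numbers; the least-squares fit of log D(t) against t is an assumption of this sketch, not necessarily the fitting procedure used for figure 3, and the function names are hypothetical.

```python
import math

def distance(pop1, pop2):
    """Mean squared difference between paired surveys of two populations."""
    total = 0.0
    for a, b in zip(pop1, pop2):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total / len(pop1)

def decay_rate(d_values):
    """Least-squares slope of -log D(t) against t = 0, 1, 2, ...;
    points with D(t) = 0 are skipped."""
    points = [(t, math.log(d)) for t, d in enumerate(d_values) if d > 0.0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return -num / den
```

A positive value returned by `decay_rate` corresponds to two runs that merge, and a value compatible with zero or negative corresponds to runs that stay apart.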

The estimated value of $\alpha_U$, i.e. 4.36, is larger than the value where the complexity vanishes
(i.e. 4.27), so that the previous conjectures should be correct in the interesting region of positive
complexity, where there should be solutions of the satisfiability conditions that correspond to
the solutions of the survey propagation equations.

The reader should notice that the survey propagation equations do have a solution also
for $\alpha > \alpha_c$, and the fact that the complexity turns out to be negative is a warning that the original
satisfiability problem does not have any solution. In this case the heuristic construction of the
surveys is empty.

4 Conclusions. 

The main purpose of this note is the construction (following Aldous [18]) of the infinite rooted
tree associated to a given satisfiability problem. This infinite rooted tree plays the role of a bridge
between a finite instance of the problem and the infinite random tree where analytic computations
[1, 12] are done. It is argued that the existence of a unique stable solution on the infinite tree
(that apparently holds for $\alpha < \alpha_U \approx 4.36$) implies the existence of a unique stable solution of
the survey equations on a large system in the same range of $\alpha$.

This result implies that the survey equations that have been used [1] in an algorithm
to find an actual solution of an instance of the K-sat problem do have a solution. However,
independently of the interest of this application of the survey equations, I believe that the
conjectures that have been put forward have a mathematical interest of their own because they
clarify the fundamental hypothesis behind the approach; eventually they can be
used to prove similar results in other problems (like the p-spin model).